Main manuscript

Table 1. Characteristics of the household contacts of index cholera cases in Dhaka, Bangladesh

Category                                 Group      Total N  Total %  Infected N  Infected %
Sex                                      Female         134     51.3          40        29.9
                                         Male           127     48.7          37        29.1
Age group                                0-4              4      1.5           4       100.0
                                         5-17            74     28.4          25        33.8
                                         18-49          164     62.8          45        27.4
                                         50-71           19      7.3           3        15.8
Self-reported history of cholera         No             251     96.2          76        30.3
                                         Yes             10      3.8           1        10.0
Vibriocidal titer >=320 upon enrollment  No             226     86.6          74        32.7
                                         Yes             35     13.4           3         8.6
Index cholera case received antibiotics  No             181     69.3          60        33.1
                                         Yes             80     30.7          17        21.2
Relationship to index cholera case       Child           48     18.4          18        37.5
                                         Sibling         47     18.0          15        31.9
                                         Spouse          61     23.4          15        24.6
                                         Parent         101     38.7          29        28.7
Diarrhea duration in index case          0-6 days        67     25.7          17        25.4
                                         7-13 days       92     35.2          23        25.0
                                         14-20 days      38     14.6          12        31.6
                                         21-73 days      64     24.5          25        39.1

Figure 1. Dynamics of functional antibody responses following V. cholerae infection

Antibody-dependent complement deposition (ADCD), antibody-dependent cellular phagocytosis (ADCP), and antibody-dependent neutrophil phagocytosis (ADNP) in serum from 24 index cholera cases upon enrollment (i.e., Day 2 after presumed symptom onset) and on days 7 and 30 after onset. Each point represents the geometric mean of the complement deposition or phagocytic score across three replicates. Boxplots show the median (50%) and the 25% and 75% quantiles; whiskers represent approximate 95% confidence intervals for comparing medians (McGill et al. 1978).
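
The two summaries named in this caption can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the figure: the function names and example values are hypothetical, and the interval assumes the median ± 1.58 × IQR / sqrt(n) formula associated with McGill et al. (1978), computed with inclusive quartiles.

```python
import math
import statistics


def replicate_geometric_mean(replicates):
    """Geometric mean of positive replicate scores for one sample."""
    return statistics.geometric_mean(replicates)


def median_comparison_interval(values):
    """Approximate 95% interval for comparing medians.

    Returns (lower, median, upper) using median +/- 1.58 * IQR / sqrt(n).
    The 'inclusive' quartile method is an assumption of this sketch.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median, median + half_width


# Hypothetical replicate scores for a single serum sample.
print(replicate_geometric_mean([1.0, 10.0, 100.0]))
# Hypothetical per-participant scores at a single time point.
print(median_comparison_interval([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

Two groups whose intervals do not overlap have medians that differ at roughly the 5% level, which is how such intervals are usually read.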

Figure 2. Risk of infection in household contacts of an index cholera case

Odds ratio of becoming infected among household contacts of an index cholera case for every 2-fold increase in baseline (Day 2) antibody titers, after adjusting for age and household clustering. Mean and 95% confidence interval are shown for each biomarker analyzed independently. Arrows indicate where the upper confidence limit extends beyond the graph. Isotype is indicated on the left and antigen on the right for binding antibody titers. Biomarkers for which logistic model fitted values were very close to 1 are excluded.

## Dropped the following vars with logistic model fitted values very close to 1:
## IgG2OgOSP_Lx
## IgG2InOSP_Lx
## IgA2OgOSP_Lx
## IgA2InOSP_Lx
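
As a rough guide to how an odds ratio per 2-fold increase in titer (as in Figure 2) relates to a logistic regression coefficient, the sketch below converts a coefficient and its standard error into an odds ratio with a Wald 95% confidence interval. The function name and the input values are hypothetical; the sketch assumes the titer entered the model on a log scale and that the standard error already reflects any adjustment for household clustering, which is not shown here.

```python
import math


def odds_ratio_per_doubling(beta, se, log_base=2.0, z=1.96):
    """Odds ratio and Wald 95% CI for a 2-fold increase in titer.

    beta and se are the coefficient and standard error of the titer term
    from a logistic model in which titer entered as log(titer, log_base).
    A doubling of titer moves that predictor by log(2, log_base) units.
    """
    step = math.log(2.0, log_base)
    point = math.exp(beta * step)
    lower = math.exp((beta - z * se) * step)
    upper = math.exp((beta + z * se) * step)
    return point, lower, upper


# Hypothetical coefficient for a log2-transformed titer.
print(odds_ratio_per_doubling(beta=-0.5, se=0.2))
# Same idea when the model used natural-log titers instead.
print(odds_ratio_per_doubling(beta=-0.5, se=0.2, log_base=math.e))
```

An odds ratio below 1 with an interval that excludes 1 would indicate that higher baseline titers are associated with lower odds of infection.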

Figure 3. Biomarkers important for classifying household contacts by infection outcome

Top 15 biomarkers important in classifying household contacts of index cholera cases as infected (i.e., becoming stool positive) vs. uninfected (i.e., remaining stool negative). Biomarkers are ranked by importance scores calculated using conditional random forest classification models.
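
Importance scores of this kind build on permutation importance: a predictor matters if shuffling its values degrades classification. The sketch below shows only the plain, unconditional permutation step with a hypothetical stand-in classifier and made-up data. The conditional variant additionally restricts the shuffling to groups defined by correlated predictors; that step, and the random forest itself, are not implemented here.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the observed label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, column, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + [v] + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


# Hypothetical data: column 0 drives the outcome, column 1 is noise.
rows = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
        [0.6, 3.0], [0.7, 1.5], [0.8, 4.5], [0.9, 2.5]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def stand_in_model(row):
    """Hypothetical fitted classifier that only looks at column 0."""
    return int(row[0] > 0.5)


for col in (0, 1):
    print(col, permutation_importance(stand_in_model, rows, labels, col))
```

Ranking predictors by this drop in accuracy gives an ordering analogous to the one plotted, with unused predictors scoring zero.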

Figure 4. Predicting infection outcome among household contacts using different subsets of biomarkers

A) Cross-validated receiver operating characteristic curves (cvROC) for classifying household contacts of index cholera cases that remain uninfected vs. become infected using random forest models with different subsets of biomarkers and age. “5 biomarkers” corresponds to five of the top biomarkers selected via conditional importance: ADCD, CtxB IgM, TcpA IgG2, Ogawa-OSP IgG1, and Sialidase IgG1. True and false positive rates were calculated using leave-one-out cross-validation. B) Cross-validated area under the curve (cvAUC) corresponding to the models in A. Influence-curve based 95% confidence intervals are shown for the cvAUC estimates.
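
The leave-one-out procedure behind a cross-validated ROC curve and AUC can be sketched as follows. Each participant is scored by a model fit to all other participants, and the pooled held-out scores are then turned into true and false positive rates. This is an illustration under stated assumptions: a simple nearest-centroid scorer stands in for the random forest, the data are made up, all function names are hypothetical, and the influence-curve confidence interval is not computed.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(labels, scores):
    """(false positive rate, true positive rate) at each score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def centroid(rows):
    """Column-wise mean of a list of feature rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_score(train_rows, train_labels, row):
    """Stand-in classifier: higher means closer to the positive centroid."""
    pos = centroid([r for r, y in zip(train_rows, train_labels) if y == 1])
    neg = centroid([r for r, y in zip(train_rows, train_labels) if y == 0])
    return squared_distance(row, neg) - squared_distance(row, pos)


def leave_one_out_scores(rows, labels):
    """Score each row with a model fit on every other row."""
    scores = []
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        scores.append(centroid_score(train_rows, train_labels, rows[i]))
    return scores


# Hypothetical single-biomarker data; 1 = infected, 0 = uninfected.
rows = [[0.0], [0.2], [0.1], [1.0], [1.2], [0.9]]
labels = [0, 0, 0, 1, 1, 1]
held_out = leave_one_out_scores(rows, labels)
print(roc_points(labels, held_out))
print(auc(labels, held_out))
```

Because every score comes from a model that never saw that participant, the resulting curve estimates out-of-sample rather than in-sample discrimination.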

Table 2. Characteristics of North American volunteers vaccinated and then challenged with V. cholerae

Characteristic  Category    N  No qualifying diarrhea  Mild         Moderate   Severe
Challenge       11         34  29 (85.3%)              3 (8.8%)     1 (2.9%)   1 (2.9%)
                91         33  18 (54.5%)              11 (33.3%)   2 (6.1%)   2 (6.1%)
Age group       18-25      15  12 (80%)                3 (20%)      0 (0%)     0 (0%)
                26-35      34  24 (70.6%)              7 (20.6%)    1 (2.9%)   2 (5.9%)
                36-45      18  11 (61.1%)              4 (22.2%)    2 (11.1%)  1 (5.6%)
Sex             Female     16  11 (68.8%)              3 (18.8%)    0 (0%)     2 (12.5%)
                Male       51  36 (70.6%)              11 (21.6%)   3 (5.9%)   1 (2%)

Figure 5. Biomarkers important for classifying vaccinees by whether or not they developed diarrhea following V. cholerae challenge

Top 15 biomarkers important in classifying vaccinees as developing either no qualifying diarrhea or mild to severe diarrhea following V. cholerae challenge. Biomarkers were selected using serum collected from participants on the day of challenge. Biomarkers are ranked by importance scores calculated using conditional random forest classification models.

Figure 6. Predicting whether or not vaccinees develop diarrhea using different subsets of biomarkers

A) Cross-validated receiver operating characteristic curves (cvROC) for classifying vaccinees that develop diarrhea vs. those that do not using random forest models with different subsets of biomarkers and age. “5 biomarkers” corresponds to five of the top biomarkers selected via conditional importance: CT-HT IgA, CtxB IgA, CT-HT IgA2, TcpA IgA, and Sialidase IgA2. True and false positive rates were calculated using leave-one-out cross-validation. B) Cross-validated area under the curve (cvAUC) corresponding to the models in A. Influence-curve based 95% confidence intervals are shown for the cvAUC estimates.

Supplementary materials

Figure S1. Antibody titers upon enrollment among household contacts of index cholera cases

Functional and antigen-isotype-specific binding antibody responses among household contacts in the case-ascertainment studies that remained uninfected (green), that became infected but did not develop symptoms (purple), and that became infected and developed symptoms (red). Each point represents the geometric mean of antibody titers across 2-3 replicates for each contact upon enrollment (Day 2). Boxplots show the median (50%) and the 25% and 75% quantiles; whiskers represent approximate 95% confidence intervals for comparing medians (McGill et al. 1978). Pink * indicates that differences between infected and uninfected contacts were significant after adjusting for age and household clustering (also see Figure 2). Pink ^ indicates that differences between infected contacts that developed symptoms and those that remained asymptomatic were significant after adjusting for age and household clustering (also see Figure S2).

Figure S2. Risk of developing symptoms among infected household contacts

Odds ratio of developing symptoms among infected household contacts (n=77) for every 2-fold increase in baseline (Day 2) antibody titers, after adjusting for age and household clustering. Mean and 95% confidence interval are shown for each biomarker analyzed independently. Arrows indicate where the upper confidence limit extends beyond the graph. Isotype is indicated on the left and antigen on the right for binding antibody titers. Biomarkers for which logistic model fitted values were very close to 1 are excluded.

## Dropped the following vars with logistic model fitted values very close to 1:
## IgGtotalInOSP_Lx
## IgG1InOSP_Lx
## IgG2TcpA_Lx
## IgG2OgOSP_Lx
## IgG2InOSP_Lx
## IgG4InOSP_Lx
## IgAtotalInOSP_Lx
## IgA1InOSP_Lx
## IgA2TcpA_Lx
## IgA2OgOSP_Lx
## IgA2InOSP_Lx

Figure S3. Unsupervised analysis of biomarkers in household contacts of index cholera cases upon enrollment

Generalized principal components analysis (GLM-PCA) plot for dimension reduction of the antibody responses in household contacts of index cholera cases upon enrollment (Day 2) (Townes et al. 2019). The first two components are shown, labelled by individual outcome.

## Dropping 0/261 (0%) of participants due to missing values in at least 1 biomarker

Figure S4. Pair-wise correlations between biomarkers in household contacts

Pair-wise Spearman rho coefficients for serum biomarkers among household contacts of cholera cases upon enrollment. Significant (p < 0.05) correlations are shown by an open box and non-significant correlations are marked with an X.
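
For reference, Spearman's rho is the Pearson correlation of the ranks of the two variables, with tied values sharing their average rank. A minimal sketch follows, using hypothetical function names and made-up values; the significance test used to mark correlations at p < 0.05 is not included.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical titers for two biomarkers in five participants.
print(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]))
```

Because only ranks are used, the coefficient is unchanged by monotone transformations such as taking the log of a titer.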

Figure S5. Predicting whether or not household contacts remain uninfected using an ensemble of machine learning models

A) Cross-validated receiver operating characteristic curves (cvROC) for classifying household contacts of index cholera cases that remain uninfected vs. become infected using an ensemble of three machine-learning models: random forest, penalized logistic regression, and support vector machine. Models were run with different subsets of biomarkers and age. “5 biomarkers” corresponds to five of the top biomarkers selected via conditional importance: ADCD, CtxB IgM, CT-HT IgG1, Ogawa-OSP IgG1, and TcpA IgG2. True and false positive rates were calculated using leave-one-out cross-validation. B) Cross-validated area under the curve (cvAUC) corresponding to the models in A. Influence-curve based 95% confidence intervals are shown for the cvAUC estimates.
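
The caption does not state how the three models are combined. One simple possibility, shown purely as an assumption, is to average each participant's predicted probability across models before computing the ROC curve; weighted combinations are equally plausible. The function name and the probabilities below are hypothetical.

```python
def ensemble_average(model_predictions):
    """Unweighted mean of predicted probabilities, participant by participant.

    model_predictions holds one list of probabilities per model, all in the
    same participant order. Equal weighting is an assumption of this sketch.
    """
    n_models = len(model_predictions)
    return [sum(p) / n_models for p in zip(*model_predictions)]


# Hypothetical held-out probabilities of infection for two participants.
random_forest = [0.2, 0.8]
penalized_logistic = [0.4, 0.6]
support_vector_machine = [0.6, 0.7]
print(ensemble_average([random_forest, penalized_logistic, support_vector_machine]))
```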

Figure S6. Unsupervised analysis of biomarkers in vaccinees upon V. cholerae challenge

Generalized principal components analysis (GLM-PCA) plot for dimension reduction of the antibody responses, measured immediately prior to challenge with V. cholerae, in volunteers that received a live attenuated oral cholera vaccine (Townes et al. 2019). The first two components are shown, labelled by individual outcome. Results are shown for antibody responses on the day of challenge.

## Dropping 1/67 (1.5%) of participants due to missing values in at least 1 biomarker

Figure S7. Pair-wise correlations between biomarkers in vaccinees

Pair-wise Spearman rho coefficients for serum biomarkers among vaccinees immediately prior to V. cholerae challenge. Significant (p < 0.05) correlations are shown by an open box and non-significant correlations are marked with an X.

Figure S8. Biomarkers important for classifying household contacts ages 18-48 by infection outcome

Top 15 biomarkers important in classifying household contacts aged 18-48 years old as infected (i.e., becoming stool positive) vs. uninfected (i.e., remaining stool negative). Biomarkers are ranked by importance scores calculated using conditional random forest classification models.

Figure S9. Biomarkers important for classifying vaccinees by outcome following challenge on either day 10 or day 90

Top 15 biomarkers important in classifying vaccinees as developing either no qualifying diarrhea or mild to severe diarrhea following V. cholerae challenge on day 10 (A) or day 90 (B). Biomarkers were selected using serum collected from participants on the day of challenge. Biomarkers are ranked by importance scores calculated using conditional random forest classification models.

Figure S10. Predicting whether or not vaccinees develop diarrhea following challenge on either day 10 or day 90

A) Cross-validated receiver operating characteristic curves (cvROC) for classifying vaccinees that develop diarrhea vs. those that do not using random forest models with different subsets of biomarkers and age. “5 biomarkers” corresponds to five of the top biomarkers selected via conditional importance. For Day 10, “5 biomarkers” included vibriocidal titer, ADNP, ADCD, Sialidase IgA, and Sialidase IgA1. For Day 90, “5 biomarkers” included CT-HT IgA2, Inaba-OSP IgA2, CtxB IgA, CT-HT IgA, and CtxB IgM. True and false positive rates were calculated using leave-one-out cross-validation. B) Cross-validated area under the curve (cvAUC) corresponding to the models in A. Influence-curve based 95% confidence intervals are shown for the cvAUC estimates.

Figure S11. Predicting whether or not vaccinees develop diarrhea using fold increase in vibriocidal titers

A) Cross-validated receiver operating characteristic curves (cvROC) for classifying vaccinees that develop diarrhea vs. those that do not using conditional random forest models with different vibriocidal titer measurements and age. True and false positive rates were calculated using leave-one-out cross-validation. B) Cross-validated area under the curve (cvAUC) corresponding to the models in A. Influence-curve based 95% confidence intervals are shown for the cvAUC estimates.

Figure S12. Predicting whether or not vaccinees develop diarrhea using an ensemble of machine learning models

A) Cross-validated receiver operating characteristic curves (cvROC) for classifying vaccinees that develop diarrhea vs. those that do not using an ensemble of three machine-learning models: random forest, penalized logistic regression, and support vector machine. Models were run with age and different subsets of biomarkers measured in serum on the day of challenge. “5 biomarkers” corresponds to five of the top biomarkers selected via conditional importance: CT-HT IgA, CT-HT IgA2, CtxB IgA, TcpA IgA, and Inaba-OSP IgG3. True and false positive rates were calculated using leave-one-out cross-validation. B) Cross-validated area under the curve (cvAUC) corresponding to the models in A. Influence-curve based 95% confidence intervals are shown for the cvAUC estimates.

Figure S13. Predicting infection outcome among household contacts using models fit with vaccinee data and vice versa

A) Cross-prediction cross-validated receiver operating characteristic curves (cvROC). “Contacts predicted with vaccinees” indicates predicting whether household contacts remained uninfected vs. became infected using the “5 biomarkers” random forest model fit with the vaccinee data (biomarkers included ADCD, CtxB IgM, TcpA IgG2, Ogawa-OSP IgG1, and Sialidase IgG1). “Vaccinees predicted with contacts” indicates predicting whether or not vaccinees develop diarrhea using the “5 biomarkers” random forest model fit with the household contacts data (biomarkers included CT-HT IgA, CtxB IgA, CT-HT IgA2, TcpA IgA, and Sialidase IgA2). True and false positive rates were calculated using leave-one-out cross-validation. B) Cross-validated area under the curve (cvAUC) corresponding to the models in A. Influence-curve based 95% confidence intervals are shown for the cvAUC estimates.

Figure S14. Comparison of original vibriocidal titer responses to new vibriocidal titers in household contacts of index cholera cases

Vibriocidal antibody titers for the subset (n = 252) of samples for which there was sufficient sample remaining to re-run the vibriocidal antibody assays. The purple line shows a 1:1 relationship, and the size of each point corresponds to the number of samples with that pair of original and new titers.
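
Because titers take a small set of discrete dilution values, many samples share the same pair of original and new titers, which is why point size encodes a count. A minimal sketch of that tally, with a hypothetical function name and made-up titers:

```python
from collections import Counter


def titer_pair_counts(original_titers, new_titers):
    """Number of samples at each (original titer, new titer) coordinate."""
    return Counter(zip(original_titers, new_titers))


# Hypothetical paired titers for four samples.
original = [40, 40, 80, 160]
repeated = [40, 40, 160, 160]
counts = titer_pair_counts(original, repeated)
print(counts)
# Share of samples lying exactly on the 1:1 line.
print(sum(n for (a, b), n in counts.items() if a == b) / len(original))
```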

Figure S15. Predicting stool culture status using old and new vibriocidal titers from the household contacts of index cholera cases

A) Cross-validated receiver operating characteristic curves (cvROC) for classifying household contacts as uninfected vs. infected based on old and new vibriocidal antibody titers and age. Analysis includes the subset (n = 252) of samples for which there was sufficient sample remaining to re-run the vibriocidal antibody assays. True and false positive rates were calculated using leave-one-out cross-validation. B) Cross-validated area under the curve (cvAUC) corresponding to the models in A. Influence-curve based 95% confidence intervals are shown for the cvAUC estimates.